Transcriptome, Ectopic Expression and Genetic Population Analysis Identify Candidate Genes for Fiber Quality Improvement in Cotton

Comparative transcriptome analysis of fiber tissues between Gossypium barbadense and Gossypium hirsutum could reveal the molecular mechanisms underlying high-quality fiber formation and identify candidate genes for fiber quality improvement. In this study, 759 genes were found to be strongly upregulated at the elongation stage in G. barbadense, which showed four distinct expression patterns (I–IV). Among them, the 346 genes of group IV stood out in terms of the potential to promote fiber elongation, in which we finally identified 42 elongation-related candidate genes by comparative transcriptome analysis between G. barbadense and G. hirsutum. Subsequently, we overexpressed GbAAR3 and GbTWS1, two of the 42 candidate genes, in Arabidopsis plants and validated their roles in promoting cell elongation. At the secondary cell wall (SCW) biosynthesis stage, 2275 genes were upregulated and exhibited five different expression profiles (I–V) in G. barbadense. We highlighted the critical roles of the 647 genes of group IV in SCW biosynthesis and further picked out 48 SCW biosynthesis-related candidate genes by comparative transcriptome analysis. SNP molecular markers were then successfully developed to distinguish the SCW biosynthesis-related candidate genes from their G. hirsutum orthologs, and the genotyping and phenotyping of a BC3F5 population proved their potential in improving fiber strength and micronaire. Our results contribute to the better understanding of the fiber quality differences between G. barbadense and G. hirsutum and provide novel alternative genes for fiber quality improvement.


Introduction
Cotton is an important economic crop which provides natural fiber materials for the textile industry [1]. Gossypium hirsutum, the most widely cultivated cotton species, accounts for more than 90% of the global cotton production because of its high yield and wide applicability. Gossypium barbadense, the other cultivated allotetraploid cotton species, is characterized by its superior fiber quality [2,3]. Breeders have been trying hard to develop cotton varieties with high yield, wide adaptability and superior fiber properties by crossing G. barbadense with G. hirsutum, but it is laborious and inefficient owing to the complex trait segregation of hybrid progenies. A more economic and effective strategy is molecular breeding, including molecular marker-assisted selection, transgene and gene editing, which requires a wide identification of fiber development-related genes and accurate verification of their functions.

Overview of Fiber Transcriptomes
In the present study, to identify cotton fiber development-related genes, G. barbadense acc. Pima90-53 and Hai7124 with superior fiber properties and G. hirsutum acc. HY405 and ND13 with good fiber properties but inferior to G. barbadense were selected for comparative transcriptome analysis. Fiber quality analysis showed that the two G. barbadense accessions compared with the two G. hirsutum accessions possessed significantly higher fiber length (i.e., fiber upper-half mean length) and fiber strength and a lower fiber micronaire. Besides that, fiber elongation and fiber uniformity showed no significant differences (Figure 1a-d; Table S1). In total, 28 cDNA libraries from seven timepoints were constructed to perform RNA-seq, including 0, 5, 10, 15, 20, 25 and 30 DPA (Figure 1e). Based on the time course of fiber development [25][26][27][28], 0 DPA (ovule samples) was designated as lint initiation. Moreover, lint elongation was referred to the fiber samples of 5, 10 and 15 DPA, while SCW biosynthesis was characterized using the fiber samples of 20, 25 and 30 DPA. A total of 420, 430, 437 and 414 million clean reads were generated in Pima90-53, Hai7124, HY405 and ND13, respectively. Although multi-mapped reads account for approximately 11% of the clean reads due to the high homology between the At and Dt subgenomes, about 80% of the clean reads were uniquely mapped to the upland cotton TM-1 genome (Table S2). Specifically, the percentages varied from 75.74% to 79.63% in Pima90-53, from 74.50% to 80.03% in Hai7124, from 78.70% to 83.03% in HY405 and from 79.05% to 82.94% in ND13. According to the uniquely mapped reads, gene expression analysis was carried out for each sample, and then we performed principal component analysis (PCA) based on gene expression levels to analyze the correlations between the samples. Analyzed by PC1 and PC2, the samples were classified by development processes regardless of accessions. More specially, the samples of 20, 25 and 30 DPA clustered together, corresponding to the continuous and stable deposition of cellulose during SCW biosynthesis (Figure 1f). Analyzed by PC1 and PC3, the samples of G. barbadense were clearly separated from those of G. hirsutum, suggesting great potential to mine candidate genes for cotton fiber quality improvement by comparative transcriptome analysis.

Preferentially Expressed Genes during Fiber Elongation in G. barbadense
To identify the genes related to cotton fiber elongation and SCW biosynthesis in G. barbadense, differential expression analysis was conducted, which found a total of 17,473 and 16,538 genes (in Pima90-53 and Hai7124, respectively) to be significantly upregulated or downregulated at 5, 10, 15, 20, 25 and 30 DPA as compared with 0 DPA. Cluster analysis divided these differentially expressed genes into six categories based on expression trends ( Figure S1a). By comparison with lint initiation (0 DPA), type I genes were upregulated during lint elongation, type III genes were upregulated during SCW biosynthesis and type II genes showed an upregulated expression at both of these stages. On the contrary, type IV and VI genes exhibited a downregulated expression during lint elongation and SCW biosynthesis, respectively, and type V genes were downregulated at the two stages. Furthermore, the common genes of Pima90-53 and Hai7124 in each category were characterized with GO enrichment analysis to identify the significantly enriched biological processes ( Figure S1b). As a result, GO terms including fatty acid metabolic process, cellular carbohydrate metabolic process, plant type secondary cell wall biogenesis and actin cytoskeleton organization were enriched in the upregulated genes, while the downregulated genes were mainly enriched in GO terms involving cell proliferation and differentiation, such as chromatin organization, DNA metabolic process and meristem development. Obviously, the candidate genes promoting fiber elongation and/or SCW biosynthesis could be easily available from the upregulated genes rather than from the downregulated genes that may affect ovule development and fiber initiation.

Preferentially Expressed Genes during Fiber Elongation in G. Barbadense
To identify the genes related to cotton fiber elongation and SCW biosynthesis barbadense, differential expression analysis was conducted, which found a total of 1 and 16,538 genes (in Pima90-53 and Hai7124, respectively) to be significantly upregu or downregulated at 5, 10, 15, 20, 25 and 30 DPA as compared with 0 DPA. Cluster an divided these differentially expressed genes into six categories based on expression t ( Figure S1a). By comparison with lint initiation (0 DPA), type I genes were upregu during lint elongation, type III genes were upregulated during SCW biosynthesis and II genes showed an upregulated expression at both of these stages. On the contrary IV and VI genes exhibited a downregulated expression during lint elongation and biosynthesis, respectively, and type V genes were downregulated at the two stages thermore, the common genes of Pima90-53 and Hai7124 in each category were char ized with GO enrichment analysis to identify the significantly enriched biologica cesses ( Figure S1b). As a result, GO terms including fatty acid metabolic process, ce carbohydrate metabolic process, plant type secondary cell wall biogenesis and actin skeleton organization were enriched in the upregulated genes, while the downregu genes were mainly enriched in GO terms involving cell proliferation and differenti such as chromatin organization, DNA metabolic process and meristem developmen viously, the candidate genes promoting fiber elongation and/or SCW biosynthesis  Table S3). Although upregulated during fiber elongation, group I containing 55 genes exhibited higher expression levels during SCW biosynthesis, indicating that these genes may play more important roles in SCW thickening. Correspondingly, these genes were enriched in the carbohydrate transport and cellulose biosynthesis processes, based on GO enrichment analysis ( Figure 2c). The 86 genes of group II were highly expressed during both fiber elongation and SCW biosynthesis, suggesting a solid foundation involved in cell growth. Roughly, the 272 genes of group III and the 346 genes of group IV possessed similar expression trends, which were preferentially expressed during fiber elongation. Group III genes were mainly enriched in actin cytoskeleton organization, carbohydrate catabolic process, response to salt stress and transmembrane transport, indicating a potential role in cell expansion. Unlike group III genes, members in group IV were upregulated only during fiber elongation, not SCW biosynthesis. GO enrichment analysis showed that a lot of fiber elongation-related processes were enriched in group IV genes, including the S-adenosylmethionine metabolic process, very long-chain fatty acid metabolic process and water transport. Furthermore, group IV genes were divided into three subgroups (IV-I, -II and -III), based on the time at which their expression reached the peak (Figure 2b). The IV-I genes exhibited the highest expression at 5 and 10 DPA and were mainly enriched in the S-adenosylmethionine metabolic process, implying that these genes may create initial conditions for cotton fiber elongation. The expression of the IV-II genes peaked at 10 DPA, and the IV-III genes showed the highest expression at 10 and 15 DPA. Hence, these two subgroups might have contributed to the rapid elongation of fiber cells, which was also supported by the significantly enriched GO terms (Figure 2d). not SCW biosynthesis. GO enrichment analysis showed that a lot of fiber elongation-related processes were enriched in group IV genes, including the S-adenosylmethionine metabolic process, very long-chain fatty acid metabolic process and water transport. Furthermore, group IV genes were divided into three subgroups (IV-I, -II and -III), based on the time at which their expression reached the peak (Figure 2b). The IV-I genes exhibited the highest expression at 5 and 10 DPA and were mainly enriched in the S-adenosylmethionine metabolic process, implying that these genes may create initial conditions for cotton fiber elongation. The expression of the IV-II genes peaked at 10 DPA, and the IV-III genes showed the highest expression at 10 and 15 DPA. Hence, these two subgroups might have contributed to the rapid elongation of fiber cells, which was also supported by the significantly enriched GO terms (Figure 2d).  The RPKM values were normalized with the z-score method. (c) GO enrichment analysis in different clusters; p-value generated from the hypergeometric test was adjusted via the Benjamini-Hochberg method, generating a Q-value; GO terms with a Q-value < 0.05 were considered significantly enriched. (d) GO enrichment analysis in the IV-I, IV-II and IV-III subgroups. (e) Forty-two genes were extracted from cluster IV, which were upregulated in G. barbadense as compared with G. hirsutum during cotton fiber elongation.
To further validate the potential roles of the 42 candidate genes in cell elongation, Gbar_A05G037170 and Gbar_A05G020690 driven by a CaMV35S promoter were transferred independently into A. thaliana plants, and then their roles were determined by observing the changes of leaf trichomes and dark-grown hypocotyl cells. The two genes were selected by considering their potential as novel genes related to cotton fiber development. Gbar_A05G037170 (termed GbAAR3) is highly homologous to the AtAAR3 identified in a screen for mutants resistant to an anti-auxin [36]. Encoding a DCN1-like protein, GbAAR3 may regulate cullin neddylation and thus participate in auxin signaling by the SCF TIR1/AFB pathway [37,38]. As expected, transient expression analysis of GbAAR3 fused to green fluorescent protein (GFP) confirmed its nuclear localization (Figure 3a). Two transgenic T 3 lines OE2 and OE5 were developed, in which the stable expression of GbAAR3 was confirmed by qRT-PCR ( Figure 3b). Arabidopsis leaf trichomes were subsequently measured because they partly share common regulatory mechanisms with cotton fiber cells [39][40][41][42]. As a result, the transgenic OE2 and OE5 plants, compared with WT plants, showed significantly longer trichomes (Figure 3c,d). Moreover, dark-grown hypocotyls were employed because their growth results from cell elongation rather than division [43,44]. Here, five-day-old dark-grown OE2, OE5 and WT seedlings were measured, and then we discovered that the transgenic seedlings had significantly longer hypocotyls (Figure 3e,f). Correspondingly, longer hypocotyl epidermal cells were observed in the transgenic seedlings in a microscopic inspection (Figure 3g). Similarly, Gbar_A05G020690 (named GbTWS1) is homologous to AtTWS1 which encodes a novel small protein and is speculated to affect the functionality of the endoplasmic reticulum [45]. To determine the involvement of GbTWS1 in cell elongation, Arabidopsis GbTWS1-overexpressed T 3 lines OE5 and OE7 were generated, and stable expression was confirmed by qRT-PCR and Western blotting ( Figure S2a,b). Although the overexpression of GbTWS1 in Arabidopsis plants promoted the elongation of leaf trichomes only slightly, the transgenic seedlings produced significantly longer dark-grown hypocotyl cells and thus longer hypocotyls in comparison with WT seedlings ( Figure S2c-e). These results indicate that GbAAR3 and GbTWS1 can promote cell elongation and might have contributed to cotton fiber development and that the 42 elongation-related candidate genes may have good prospects in fiber quality improvement.

Preferentially Expressed Genes during SCW Biosynthesis in G. Barbadense
Differential expression analysis produced 2639 and 2568 genes that were upregu at 20, 25 and 30 DPA as compared with 0 DPA in G. barbadense acc. Pima90-53 and Ha respectively (Figure 4a). Subsequently, cluster analysis using gene expression da vided the common 2275 genes into five groups (Figure 4b; Table S5). The 466 gen

Preferentially Expressed Genes during SCW Biosynthesis in G. barbadense
Differential expression analysis produced 2639 and 2568 genes that were upregulated at 20, 25 and 30 DPA as compared with 0 DPA in G. barbadense acc. Pima90-53 and Hai7124, respectively ( Figure 4a). Subsequently, cluster analysis using gene expression data divided the common 2275 genes into five groups (Figure 4b; Table S5). The 466 genes in group I were upregulated during SCW thickening but showed the highest expression during fiber elongation. A lot of GO terms were significantly enriched and mainly classified into five categories: cytoskeleton organization, ATP metabolism, cell wall development, gene expression regulation and transport ( Figure S3). The 329 genes in group II exhibited high expression levels during both fiber elongation and SCW thickening. Interestingly, the group II genes were not enriched in biological processes associated with cell wall development, especially the biomacromolecule metabolic process ( Figure S4). It was obvious that the biosynthesis of PCW and SCW featured distinct gene networks. The 686 genes in group III were preferentially expressed during SCW thickening and were slightly upregulated during fiber elongation. A number of cell wall development-related processes related to cellulose, hemicellulose and lignin were significantly enriched in the group III genes. Correspondingly, we observed a lot of significantly enriched GO terms involving transport, especially vesicle-mediated transport ( Figure S5). The 647 genes in group IV and the 147 genes in group V had similar expression trends, i.e., only upregulated during SCW biosynthesis. The group IV genes were significantly enriched in the metabolic processes of cellulose, xylan, glucuronoxylan, lignin, pectin, arabinan and glycoprotein, which all belong to cell wall macromolecules ( Figure S6a,b). However, being different from the group III genes, the group VI members were hardly enriched in transport-related processes. The group V genes showed the highest expression at 30 DPA and thus were more likely to affect cotton fiber maturation ( Figure S6c-e). In addition, more GO terms involved in regulating gene expression were significantly enriched in the group I and II genes, presumably because the group III and IV genes were mainly responsible for continuous and stable SCW deposition. Obviously, all the five groups contributed to cotton fiber development based on the principle of coordination and unification (Figure 4c).
To further verify the involvement of the 48 candidate genes in fiber quality improvement, we performed the genotyping and phenotyping of a BC 3 F 5 population developed from donor parent G. barbadense acc. Pima90-53 and recipient parent G. hirsutum acc. CCRI8, and then determined whether the introgression of the G. barbadense candidate genes into G. hirsutum could improve fiber properties. For the genotyping, SNP molecular markers of 39 candidate genes were successfully developed, which can distinguish the orthologous genes between G. barbadense and G. hirsutum accurately and easily ( Figure S7). The primers of molecular markers were designed based on the SNPs between G. barbadense and G. hirsutum and the SNPs between the At and Dt subgenomes (Figure 5a; Table S8). Subse-quently, in the BC 3 F 5 population, alleles were detected at each locus, showing three types of genotypes: Gb/Gb homozygote, Gh/Gh homozygote and Gb/Gh heterozygote (Figure 5b). The phenotype data collected from Luntai (E1) and Qingxian (E2) were thus compared based on different genotypes. Interestingly enough, the type I SCW biosynthesis-related candidate genes enhanced fiber strength and did not impair fiber length and micronaire while the type II candidate genes were much more likely to improve fiber micronaire (Figures 5c and S8). Gbar_A05G019310 belonging to type I shows similarity to AtIDD1 that encodes a C2H2 zinc finger protein and mediates GA signaling. GA has been proposed to promote SCW deposition in cotton fiber cells [53]. Here, Gbar_A05G019310 was preferentially expressed during SCW biosynthesis in G. barbadense, and its orthologous gene in G. hirsutum exhibited similar expression patterns but lower transcript levels, supported by the current transcriptome data and a supplementary RNA-seq project of G. hirsutum acc. CCRI8 (Figure 5d; Table S6). As expected, it contributed to fiber strength improvement when Pima90-53 genome fragments containing Gbar_A05G019310 were recombined into the CCRI8 genome. Similarly, another type I gene Gbar_A05G025000, encoding a hypothetical transmembrane protein with unknown functions, also enhanced fiber strength (Figure 5e). Comparative phenotype analysis also identified three type II genes Gbar_A06G015550 (Figure 5f), Gbar_A13G023450 ( Figure 5g) and Gbar_D05G026930 (Figure 5h) that improved fiber micronaire significantly. Gbar_A06G015550 encodes the aldehyde dehydrogenase associated with cellular responses to oxidative stress [54]. Encoding the putative epoxide hydrolase, Gbar_A13G023450 is potentially involved in transforming epoxide-containing fatty acids and thereby cutin biosynthesis [55]. Gbar_D05G026930 encodes an uclacyaninlike blue copper-binding protein that has been implicated in lignin biosynthesis [56,57]. These kinds of genes are generally considered to involve stress responses, and thus the results provide a novel alternative way to improve cotton fiber quality. Gbar_D06G000230 β-tubulin HEBAU0018 Gh_D07G0285 Gbar_D07G003310 Clathrin assembly protein NA Gh_D09G0011 Gbar_D09G000120 Protein of unknown function HEBAU0019 Gh_D10G1036 Gbar_D10G011020 Xylan glucuronosyltransferase HEBAU0020 Gh_D10G1767 Gbar_D10G018450 β-xylosidase HEBAU0021 Gh_D12G2377 Gbar_D12G025300 Protein of unknown function NA from the group III genes, the group VI members were hardly enriched in transport-related processes. The group V genes showed the highest expression at 30 DPA and thus were more likely to affect cotton fiber maturation ( Figure S6c-e). In addition, more GO terms involved in regulating gene expression were significantly enriched in the group I and II genes, presumably because the group III and IV genes were mainly responsible for continuous and stable SCW deposition. Obviously, all the five groups contributed to cotton fiber development based on the principle of coordination and unification (Figure 4c).

Identification and Functional Analysis of the SCW Biosynthesis-Related Genes
Given the critical roles of the group IV genes in SCW thickening (Figure 4d), the 647 genes were further used to identify candidate genes improving cotton fiber strength and micronaire. By means of comparative transcriptome analysis, 48 genes were filtered from the group IV genes which were upregulated in G. barbadense as compared with G. hirsutum during SCW biosynthesis (Figure 4e). The 48 candidate genes were subsequently divided  Table 3. Type II SCW biosynthesis-related candidate genes.

Discussion
In the present study, we identified 42 and 48 candidate genes for cotton quality improvement during fiber elongation and SCW biosynthesis, respectively. Among the 42 elongation-related candidate genes, some genes, e.g., α-expansin and pectinesterase, have been widely proposed to affect cotton fiber development (Table 1). Therefore, the identified candidate genes have paved alternative ways for fiber length improvement. Gbar_A13G012640 from IV-I, encoding a DUF538 domain-containing protein, is homologous to AtSVB (AT1G56580), mutants of which exhibit smaller trichomes. Arabidopsis leaf trichomes share similar developmental mechanisms with fiber cells of cotton [39][40][41][42], and thus Gbar_A13G012640 might be responsible for cotton fiber development. ROS (reactive oxygen species) signaling is well-known to be crucial for cell elongation [58]. Strikingly, two highly homologous genes Gbar_A11G014750 and Gbar_D11G015570 encode a cysteine-rich STOMAGEN peptide, showing great potential in ROS perception [59]. The two genes were predominantly expressed during early stages of fiber elongation (Figure 2b), and thus it is conceivable that they might contribute to initiating the elongation process. ROS, at high concentrations, can lead to cell wall stiffening and thus suppress extension [60]. Not surprisingly, two IV-II candidate genes Gbar_D11G024050 (glutathione S-transferase) [61] and Gbar_D12G026630 (heat shock protein) [62], showing the highest expression at 10 DPA, may play a significant role in ROS homeostasis and help to maintain the rapid elongation of fiber cells. In addition, Gbar_A07G023310 from IV-III, encoding formate dehydrogenase, is also involved in scavenging ROS, as it produces NADH [63]. Remarkably, several genes participating in the metabolism of S-adenosylmethionine stood out, including Gbar_A07G013810 (S-adenosylmethionine synthase), Gbar_A09G015680 (S-adenosylmethionine synthase), Gbar_D10G014320 (methylenetetrahydrofolate reductase) and Gbar_A06G013440 (cystathionine β-synthase) [64]. S-adenosylmethionine not only provides the methyl group for the majority of methylation reactions, but also enters the ethylene and polyamine biosynthetic pathways. Obviously, it is worth trying to dissect the roles of the S-adenosylmethionine metabolism-related genes in fiber development. Besides that, the fact that GbAAR3 (from IV-III) promoted cell elongation in Arabidopsis plants raises our interest on cullin neddylation as well as SCF TIR1/AFB -mediated auxin signaling in cotton fiber elongation [37,38] reinforced by a SAUR-like auxin-responsive protein (Gbar_D12G003220 from IV-III).
Among the 48 SCW biosynthesis-related candidate genes, a number of well-known cell wall development-related genes were observed and presented. Here, we discuss the functions of other genes which provide novel potential ways for fiber quality improvement. Unlike vascular cells, cotton fibers contain only a small amount of lignin whose role in fiber development has been neglected. Recent studies, however, have shown that lignin-like phenolics can affect fiber quality substantially in a negative pattern during elongation and in a positive pattern during SCW thickening [65,66]. Type I candidate gene Gbar_A11G034900 encodes laccase that is responsible for the final stage of lignin polymerization [47]. Having higher expression levels than its G. hirsutum ortholog during SCW thickening (Figure 4f), Gbar_A11G034900 may contribute to forming stronger and thinner fiber cells. Uclacyanin proteins have been considered to regulate lignin biosynthesis [56,57], but the mechanisms remain unclear. Hence, the fiber fineness improvement caused by the introgression of type II candidate gene Gbar_D05G026930 encoding an uclacyanin-like blue copper-binding protein could be due to an increase in lignin/lignin-like phenolics (Figure 5h). Recent advances have revealed that plant hormones play a pivotal role in regulating cotton fiber development, mainly focusing on fiber initiation and elongation. Here, type I gene Gbar_A05G019310 shows high homology to AtIDD1 [67,68] and thus is assumed to mediate GA signaling. The fact that Gbar_A05G019310 can enhance fiber strength (Figure 5d) highlights the importance of the IDD (INDETERMINATE DOMAIN) family as well as GA signaling in SCW biosynthesis. Another type I gene Gbar_A05G022250 encodes the EXORDIUM protein that has been proposed to participate in BR signaling [69]. BR signaling appears likely to change the expression of cellulose synthase genes and thus affects SCW deposition in cotton fiber [70]. Intriguingly, we identified two cellulose synthase genes Gbar_A07G004070 and Gbar_D03G007510, suggesting a possible signaling module involved in BRs, EXORDIUM protein and cellulose synthase. The ABC transporter plays a central role in transporting auxin, ABA and cytokinin. Gbar_A10G008340 from type II, encoding the ABCG-type transporter, is homologous to AtABCG40 that exhibits ABA uptake activity [71]. Another type II gene Gbar_A13G001740 encodes the auxin influx transporter that is responsible for importing auxin into cells [38]. Obviously, phytohormones play a vital role in SCW biosynthesis, and their crosstalk and transcriptional regulation may contribute to stronger and thinner fibers in G. barbadense. Hormonal regulation of transcription is involved in the ubiquitin/26S proteasome pathway, such as Aux/IAA degradation in auxin signaling [38] and DELLA degradation in GA signaling [68]. Gbar_A05G004360 from type I encodes the putative F-box component of the SCF E3 ligase complex, while type II gene Gbar_D06G019950 encodes RING-type E3 ligase. It is particularly interesting to investigate whether these two genes act as key regulators of hormone signaling in plants.
In this study, genetic experiments were conducted to further confirm the roles of the candidate genes in cotton fiber development. By means of homology analysis, we speculated that GbAAR3, a type IV-III elongation-related candidate gene, could regulate auxin signaling through the cullin neddylation-mediated SCF TIR1/AFB pathway [36][37][38]. GbTWS1 from IV-II, which is implicated in the functioning of the endoplasmic reticulum, might affect fatty acid biosynthesis and thus regulate the organization of the endomembrane system [45]. The fact that overexpression of GbAAR3 ( Figure 3) and GbTWS1 ( Figure S2) can promote cell elongation in Arabidopsis has increased the importance of candidate genes in fiber length enhancement of cotton. For the SCW biosynthesis-related candidate genes, we developed SNP molecular markers to distinguish the candidate genes from their G. hirsutum orthologs, and then the genotyping and phenotyping of a BC 3 F 5 population were conducted to evaluate the influence on cotton fiber quality generated by the introgression of the candidate genes into G. hirsutum. Although type I candidate genes enhanced fiber strength as a whole, several members showed pleiotropic effects. For instance, Gbar_D06G000230, a β-tubulin gene, exhibited dual functions in improving fiber strength and fiber micronaire ( Figure S8c), presumably because cortical microtubules play a vital role in cellulose microfibril deposition [72]. GA signaling in general promotes both elongation and SCW development in cotton fiber cells [73]. Correspondingly, the proposed GA signaling-related gene Gbar_A05G019310 enhanced not only fiber strength but also length ( Figure S8a). Similarly, the hypothetical transmembrane protein Gbar_A05G025000 also improved fiber length and strength ( Figure S8b). Furthermore, Gbar_A05G019310 fell into FUqQtlc05_1b [74], and Gbar_A05G025000 fell around QTLClust_LEN_5_3 [75] when analyzing the QTLs reported for fiber length. In this light, we can infer that some SCW biosynthesis-related genes also contribute to fiber elongation, especially because a considerable portion of SCW thickening occurs before fiber elongation ceases in both G. barbadense [28] and G. hirsutum [27]. Type II gene Gbar_A06G015550 encoding aldehyde dehydrogenase can oxidize a wide range of reactive and toxic aldehydes and meanwhile produce NAD(P)H, a highly effective antioxidant [54]. Hence, BC 3 F 5 lines with the introgression of Gbar_A06G015550 may enhance the capacity for the management of ROS and thus showed the enhancement of fiber length besides the abovementioned fiber micronaire improvement ( Figure S8d). Interestingly enough, another type II gene Gbar_A13G023450 seemed to simultaneously improve fiber length, strength and micronaire ( Figure S8e). Gbar_A13G023450 encodes epoxide hydrolase and is likely responsible for cutin biosynthesis [55]. Cutin is the main constituent of the cuticle, which is the outermost layer of cotton fibers and provides a physical barrier, but to date, there is little evidence that cutin biosynthesis affects cotton fiber development [76]. More genetic experiments by overexpressing and knocking out Gbar_A13G023450 need to be performed to further confirm its remarkable functions. Taken together, the BC 3 F 5 population analysis proved the importance of the candidate genes in fiber quality improvement and highlighted several pleiotropic genes. Nevertheless, it is worth noting that some candidate genes failed to be characterized because of no or little corresponding lines and that the influence on cotton fiber quality may be caused by candidate genes and their closely linked genes. A larger population will be applied to overcome this problem, and we will investigate the functions of the candidate genes by means of CRISPR/Cas9-mediated gene editing in future efforts.

Plant Materials and Growth Conditions
The cotton plant materials used in this study included G. barbadense acc. Pima90-53 [77] and Hai7124 [3] and G. hirsutum acc. HY405 [35], ND13 [35] and CCRI8. Pima90 The population materials were sown in mid-to late April and were harvested in mid-to late October. All field management including watering, weed management and fertilization was performed according to the local production practices throughout the growth period. A. thaliana (Columbia ecotype) wild-type and transgenic plants were grown in pots containing sterile vermiculite in a greenhouse (22 • C, 16 h photoperiod, 70% relative humidity) at Hebei Agricultural University, with Hoagland's nutrient solution added weekly.

Transcriptome Sequencing
Flowers of G. barbadense acc. Pima90-53 and Hai7124 and G. hirsutum acc. HY405 and ND13 were tagged at the flowering stage (from mid-July to late August). For each accession, ovule samples of 0 DPA and fiber samples of 5, 10, 15, 20, 25 and 30 DPA were collected independently from cotton bolls, frozen immediately in liquid nitrogen and ground mechanically to a fine powder. For each timepoint, samples from multiple cotton plants were pooled to minimize variations. Total RNA was then isolated using an RNAprep pure Plant Kit (TIANGEN, Beijing, China). Finally, 28 cDNA libraries were constructed using a NEBNext Ultra RNA Library Prep Kit for Illumina (NEB, Ipswich, MA, USA) at the Novogene Bioinformatics Institute, Beijing, China. Briefly, mRNA was first purified and fragmented. Secondly, cDNA was synthesized using a random hexamer primer, and the sequencing adaptor was ligated. Thirdly, fragments containing an insert of 150-200 bp were selected, and PCR amplification was performed.
An Illumina Hiseq 2500 platform was then used for the sequencing, and 125 bp pairedend reads were generated. To generate clean reads, raw data were first processed by removing the adapter-or polyN-containing reads and the reads with low quality. The clean reads were then aligned to the G. hirsutum TM-1 genome [29] using TopHat2 (version 2.0.12; Toronto, Ontario) [78] with the threshold of two mismatches, and the mapped reads were assembled by Cufflinks (version 2.1.1; Dallas, Texas, United States) [79]. HTSeq (version 0.6.1; Wilmington, Delaware, United States) [80] was used to count the reads mapped to each gene, and subsequently RPKM (reads per kilobase of exon model per million mapped reads) was applied to estimate the expression levels. After the read counts were adjusted using the edgeR package (version 3.0.8; Wilmington, Delaware, United States) [81], differential expression analysis was performed using the DEGSeq package (version 1.12.0; Wilmington, Delaware, United States) [82], with the criteria of |log 2 (fold change)| > 1 and Q-value < 0.005. GO enrichment analysis was implemented using KOBAS v3.0 [83]. The p-value generated from the hypergeometric test was adjusted via the Benjamini-Hochberg method, generating the Q-value. GO terms with the Q-value < 0.05 were considered significantly enriched.

Functional Analysis of the Elongation-Related Candidate Genes
The functional identification of cotton fiber elongation-related candidate genes in A. thaliana plants was conducted using the methods described previously [35]. To be specific, the coding sequence of candidate genes was cloned from fibers of G. barbadense acc. Pima90-53 and inserted into the Gateway pDONR207 vector to generate an entry clone. The gene clone was then recombined into the Gateway pGWB414 vector to construct a constitutive overexpression (OE) system under the control of a CaMV35S promoter. The OE system was transferred with an Agrobacterium-mediated method into A. thaliana Columbia ecotype plants, and subsequently the transgenic plants were confirmed by kanamycin-resistant selection and PCR detection. To observe the changes of trichomes, the fifth rosette leaves of four-week-old A. thaliana wild-type (WT) and OE plants (T 3 homozygous lines) were harvested, decolorized by ethanol and photographed with an Olympus BX51 microscope (Tokyo, Japan). The longest branch of about 150 legible trichomes was measured using the ImageJ software [84] in each line. To compare dark-grown hypocotyls, the seeds of A. thaliana WT and OE plants were sterilized and then grown in vertical plates (1/2 MS medium, 0.9% agar and pH 5.8) under the conditions of 22 • C and continuous darkness. Five-day-old seedlings were photographed with a professional Epson V800 scanner (Nagano, Japan), and their hypocotyls were then measured using the ImageJ software. Stained with propidium iodide (PI), the seedlings used above were observed using an Olympus FV10i laser scanning microscope (Tokyo, Japan) to determine the epidermal cells of the hypocotyls. The GraphPad Prism software (San Diego, CA, USA) was employed to conduct a two-tailed t-test, and p-values < 0.05 were considered statistically significantly different. To determine the subcellular localization, the gene clone was recombined into the Gateway pEarleyGate103 vector to express the target protein with a C-terminal GFP fusion. The GFP-fused target protein and GFP as a control were transiently expressed in onion epidermal cells by means of a Bio-Rad PDS-1000/He system (Hercules, CA, USA). Incubated on an MS agar medium for 24 h in continuous darkness, the transformed cells were observed and photographed with an Olympus BX51 microscope (Tokyo, Japan).

Functional Identification of the SCW Biosynthesis-Related Candidate Genes
The BC 3 F 5 population was employed to determine the involvement of the SCW biosynthesis-related candidate genes in fiber quality improvement. Twenty mature bolls of each accession (167 BC 3 F 5 lines and their parents) were harvested from the middle fruiting branches of cotton plants, and the phenotyping of fiber quality traits including length, strength, micronaire, elongation and uniformity was performed using 15 g fiber samples on an Uster HVI 1000 system under environmental conditions of 20 • C and 65% relative humidity. For genotyping, fresh leaf tissue of each accession was used for genomic DNA isolation using a modified CTAB method [85]. To differentiate the SCW biosynthesis-related candidate genes from their G. hirsutum orthologs, a simple and cost-effective tri-primer AS-PCR [86] assay was employed based on in silico single-nucleotide polymorphism (SNP) identification. For each candidate gene locus, the 3' end of primers P1 and P2 was designed for the SNP between G. barbadense and G. hirsutum, and the 3' end of a third primer P3, common to both species, was designed for the SNP between the At and Dt subgenomes to avoid effects from highly homologous genes. The G. barbadense-specific primer pair (P1 and P3) and the G. hirsutum-specific primer pair (P2 and P3) were each used for population genotyping. The Gb/Gb homozygote showed PCR products only in P1-P3, the Gh/Gh homozygote showed PCR products only in P2 and P3 and the Gb/Gh heterozygote showed PCR products in both primer pairs. To determine the influence on fiber quality with the introgression of the candidate genes, the phenotype data of accessions with the Gh/Gh genotype were compared with those of accessions with the Gb/Gb or Gb/Gh genotype. The significance of difference was analyzed with a two-tailed t-test.

Conclusions
In the present study, we highlighted a batch of genes preferentially expressed during fiber elongation and SCW biosynthesis, and thus further revealed the genetic basis underlying high-quality fiber formation. By comparative transcriptome analysis between G. barbadense and G. hirsutum, we finally identified 42 elongation-related and 48 SCW biosynthesis-related candidate genes, whose potential roles were then confirmed by ectopic overexpression and SNP marker-based BC 3 F 5 population analysis, respectively. These findings not only provide valuable information for the understanding of cotton fiber development, but also credibly contribute to cotton breeding for fiber quality improvement.

Data Availability Statement:
The data presented in this study are available within the article and the supplementary materials. The raw RNA-seq data have been deposited in the Genome Sequence Archive (https://bigd.big.ac.cn/gsa; PRJCA014806 (accessed on 26 April 2023)) and are available from the corresponding author on reasonable request.

Conflicts of Interest:
The authors declare no conflict of interest.